Introduction to data science in R

Lesson 7: Introduction to data visualization




Brian S. Evans, Ph.D.
Migratory Bird Center
Smithsonian Conservation Biology Institute



Setup for the lesson


# Load RCurl library:

library(RCurl)

# Load a source script:

script <-
  getURL(
    "https://raw.githubusercontent.com/bsevansunc/workshop_languageOfR/master/sourceCode_lesson6.R"
  )

# Evaluate then remove the source script:

eval(parse(text = script))

rm(script)

library(lubridate)

Today’s goals


Today’s data


The data frame:

birdMeasures
## # A tibble: 5,234 x 11
##            id  region   spp bandNumber   enc       date  mass  wing    tl
##         <chr>   <chr> <chr>      <chr> <chr>      <chr> <dbl> <dbl> <dbl>
##  1 g435-3576h Atlanta  NOCA 2641-63316     B 2014-05-06  36.7    92   100
##  2 c703-3173x Atlanta  NOCA 2641-63362     B 2014-06-12  40.4    93    98
##  3 b264-7018g Atlanta  CACH 2710-53995     B 2015-04-21   9.7    60    50
##  4 y107-5673o Atlanta  AMRO 1352-27606     B 2015-04-21  80.1   130    97
##  5 w113-8447n Atlanta  AMRO 1352-27609     B 2015-04-26  73.8   130    96
##  6 f364-6694j Atlanta  NOCA 2641-63899     B 2015-04-26  42.1    86   100
##  7 m960-6549h Atlanta  NOCA 2641-63900     B 2015-04-26  42.7    92   102
##  8 e424-8770v Atlanta  AMRO 1352-27610     B 2015-04-26  72.7   130    97
##  9 k126-5246c Atlanta  AMRO 1352-27614     B 2015-04-27  75.0   120    87
## 10 j492-4323t Atlanta  GRCA 2657-47401     B 2015-04-27  38.3    87    90
## # ... with 5,224 more rows, and 2 more variables: age <chr>, sex <chr>

Initiating a plot


ggplot(birdMeasures)

Aesthetics


Aesthetics describe mapping the value of some variable to an observable feature.

ggplot(birdMeasures, 
       aes(x = spp))
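A minimal sketch of what a mapping does, assuming a small made-up data frame (`toyBirds` and its values are hypothetical, not part of `birdMeasures`). Once a geometry is added, the mapped variables become the x and y positions that are drawn:

```r
library(ggplot2)

# Hypothetical toy data standing in for birdMeasures (values are made up):
toyBirds <- data.frame(
  wing = c(60, 64, 87, 92),
  mass = c(9.7, 10.9, 38.3, 40.4)
)

# Map wing to the x position and mass to the y position:
mappedPlot <- ggplot(toyBirds, aes(x = wing, y = mass)) +
  geom_point()

# layer_data() returns the values that the geometry draws; the x and y
# columns hold the values of the mapped variables:
layer_data(mappedPlot)[, c('x', 'y')]
```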

Geometries


A geometry is a plot element that provides a visible representation of observations. Geometries are called using functions named geom_[geometry]. Frequently used geometries include:

  • geom_bar: Bars for bar plots
  • geom_histogram: Histogram plot for observing distributions
  • geom_density: Density plot for observing distributions
  • geom_point: Point plot for observing raw data
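As a rough illustration of the four geometries listed above, the sketch below applies each to a small made-up data frame (`toyBirds` is hypothetical). Which geometry fits depends on whether the mapped variables are discrete or continuous:

```r
library(ggplot2)

# Hypothetical toy data standing in for birdMeasures (values are made up):
toyBirds <- data.frame(
  spp  = c('BCCH', 'BCCH', 'CACH', 'CACH', 'CACH'),
  mass = c(10.9, 11.4, 9.7, 10.1, 9.9),
  wing = c(64, 66, 60, 61, 59),
  stringsAsFactors = FALSE
)

# One discrete variable: count the observations of each species
barPlot <- ggplot(toyBirds, aes(x = spp)) +
  geom_bar()

# One continuous variable: binned counts or a smoothed density
histPlot <- ggplot(toyBirds, aes(x = mass)) +
  geom_histogram(bins = 5)

densityPlot <- ggplot(toyBirds, aes(x = mass)) +
  geom_density()

# Two continuous variables: one point per observation
pointPlot <- ggplot(toyBirds, aes(x = wing, y = mass)) +
  geom_point()

# The bar heights are the number of rows for each species:
layer_data(barPlot)$count
```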

Geometries


ggplot(birdMeasures, 
       aes(x = spp)) +
  geom_bar()

Geometries


Piping helps!

birdMeasures %>%
  ggplot(aes(x = spp)) +
  geom_bar()

Geometries


Piping helps!

birdMeasures %>%
  filter(spp != 'NOCA') %>%
  ggplot(aes(x = spp)) +
  geom_bar()
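To see why piping helps, the sketch below builds the same plot twice on made-up data (`toyBirds` is hypothetical): once with an intermediate object and once with a pipe. Note that data steps are joined with `%>%`, while plot layers are joined with `+`:

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data standing in for birdMeasures (values are made up):
toyBirds <- data.frame(
  spp = c('AMRO', 'NOCA', 'NOCA', 'GRCA', 'GRCA', 'GRCA'),
  stringsAsFactors = FALSE
)

# Without a pipe, the filtered data must be assigned a name first:
noCardinals <- filter(toyBirds, spp != 'NOCA')

plotA <- ggplot(noCardinals, aes(x = spp)) +
  geom_bar()

# With a pipe, the filtered data flows straight into ggplot():
plotB <- toyBirds %>%
  filter(spp != 'NOCA') %>%
  ggplot(aes(x = spp)) +
  geom_bar()

# Both plots draw the same bars:
layer_data(plotA)$count
layer_data(plotB)$count
```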

Exercise One:


The function geom_density can be used to display the density distribution of a vector. Using the aesthetic x = mass, display the distribution of Black-capped and Carolina chickadee mass measurements:

Exercise One:


# Subset birdMeasures to BCCH and CACH and plot density:

birdMeasures %>%
  filter(spp %in% c('BCCH', 'CACH')) %>%
  ggplot(aes(x = mass)) +
  geom_density()

Geometries: Adding arguments


birdMeasures %>%
  filter(spp %in% c('BCCH', 'CACH')) %>%
  ggplot(aes(x = mass)) +
  geom_histogram()

Geometries: Adding arguments


birdMeasures %>%
  filter(spp %in% c('BCCH', 'CACH')) %>%
  ggplot(aes(x = mass)) +
  geom_histogram(binwidth = 1)

Geometries: Adding arguments


birdMeasures %>%
  filter(spp %in% c('BCCH', 'CACH')) %>%
  ggplot(aes(x = mass)) +
  geom_histogram(bins = 20)
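The `bins` and `binwidth` arguments are two ways of controlling the same thing. A small sketch on made-up masses (`toyMass` is hypothetical) shows that changing them changes how many bars are drawn, but not how many observations are counted:

```r
library(ggplot2)

# Hypothetical toy data (values are made up):
toyMass <- data.frame(
  mass = c(9.2, 9.7, 9.9, 10.1, 10.4, 10.9, 11.4, 11.8, 12.3)
)

massBase <- ggplot(toyMass, aes(x = mass))

# Ask for a number of bins and let ggplot choose their width:
manyBins <- massBase +
  geom_histogram(bins = 10)

# Ask for a bin width (here, 1 unit of mass) and let ggplot choose the number:
wideBins <- massBase +
  geom_histogram(binwidth = 1)

# Each row of layer_data() is one bar:
nrow(layer_data(manyBins))
nrow(layer_data(wideBins))

# The counts always sum to the number of observations:
sum(layer_data(manyBins)$count)
sum(layer_data(wideBins)$count)
```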

Geometries: Adding arguments


birdMeasures %>%
  filter(spp != 'NOCA') %>%
  ggplot(aes(x = spp)) +
  geom_bar(fill = 'gray')

Geometries: Adding arguments


birdMeasures %>%
  filter(spp != 'NOCA') %>%
  ggplot(aes(x = spp)) +
  geom_bar(fill = 'gray', 
           color = 'black')

Geometries: Adding arguments


birdMeasures %>%
  filter(spp != 'NOCA') %>%
  ggplot(aes(x = spp)) +
  geom_bar(fill = 'gray', 
           color = 'black',
           size = 0.7)

Exercise Two:


Modify your density plot from Exercise One:

  • Use the fill argument to fill your density shape with the color “gray”.
  • The argument alpha can be applied to a geometry to adjust its transparency. Set the transparency of the density shape to alpha = 0.7.

Exercise Two:


# Subset birdMeasures to BCCH and CACH and plot density with a gray, semi-transparent fill:

birdMeasures %>%
  filter(spp %in% c('BCCH', 'CACH')) %>%
  ggplot(aes(x = mass)) +
  geom_density(fill = 'gray',
               alpha = 0.7)

Geometries: Adding aesthetics


Aesthetics describe mapping the value of some variable to an observable feature.

birdMeasures %>%
  filter(spp != 'NOCA') %>%
  ggplot(aes(x = spp)) +
  geom_bar(aes(fill = region))
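The difference between an argument such as `fill = 'gray'` and an aesthetic such as `aes(fill = region)` can be sketched on made-up data (`toyBirds` is hypothetical, and 'Washington' is an invented region label). Outside of `aes()` the fill is one constant color; inside `aes()` the fill varies with the values of a variable:

```r
library(ggplot2)

# Hypothetical toy data standing in for birdMeasures (values are made up):
toyBirds <- data.frame(
  spp    = c('AMRO', 'AMRO', 'GRCA', 'GRCA', 'GRCA'),
  region = c('Atlanta', 'Washington', 'Atlanta', 'Atlanta', 'Washington'),
  stringsAsFactors = FALSE
)

# Setting: every bar gets the same fill color
setFill <- ggplot(toyBirds, aes(x = spp)) +
  geom_bar(fill = 'gray')

# Mapping: the fill color depends on the region
mapFill <- ggplot(toyBirds, aes(x = spp)) +
  geom_bar(aes(fill = region))

# One fill color when set, one color per region when mapped:
unique(layer_data(setFill)$fill)
unique(layer_data(mapFill)$fill)
```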

Geometries: Adding aesthetics


birdMeasures %>%
  filter(spp %in% c('BCCH', 'CACH')) %>%
  ggplot(aes(x = mass)) +
  geom_histogram(aes(fill = sex),
                 bins = 20)

Geometries: Adding aesthetics


birdMeasures %>%
  filter(spp %in% c('BCCH', 'CACH')) %>%
  ggplot(aes(x = mass)) +
  geom_histogram(aes(fill = sex),
                 bins = 20,
                 color = 'black')

Exercise Three:


Modify your density plot from Exercise Two. Use the fill aesthetic (inside aes) of the function geom_density to assign a different fill color to females and males.

Exercise Three:


# Subset birdMeasures to BCCH and CACH and plot density:

birdMeasures %>%
  filter(spp %in% c('BCCH', 'CACH')) %>%
  ggplot(aes(x = mass)) +
  geom_density(aes(fill = sex),
               alpha = 0.7)

Facets


Faceting splits a plot into multiple panels, one for each value of some variable.

Facets


birdMeasures %>%
  filter(spp %in% c('BCCH', 'CACH')) %>%
  ggplot(aes(x = mass)) +
  geom_histogram(aes(fill = sex), 
                 bins = 20,
                 color = 'black') +
  facet_wrap(~spp)
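A short sketch of faceting on made-up data (`toyBirds` is hypothetical). Each value of the faceting variable gets its own panel, and `layer_data()` records which panel each bar belongs to:

```r
library(ggplot2)

# Hypothetical toy data standing in for birdMeasures (values are made up):
toyBirds <- data.frame(
  spp  = c('BCCH', 'BCCH', 'BCCH', 'CACH', 'CACH', 'CACH'),
  mass = c(10.9, 11.4, 11.8, 9.7, 10.1, 9.9),
  stringsAsFactors = FALSE
)

# The formula ~spp asks for one panel per species:
facetPlot <- ggplot(toyBirds, aes(x = mass)) +
  geom_histogram(bins = 5) +
  facet_wrap(~spp, nrow = 2)

# The PANEL column identifies the panel in which each bar is drawn:
unique(layer_data(facetPlot)$PANEL)
```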

Facets


birdMeasures %>%
  filter(spp %in% c('BCCH', 'CACH')) %>%
  ggplot(aes(x = mass)) +
  geom_histogram(aes(fill = sex), 
                 bins = 20,
                 color = 'black') +
  facet_wrap(~spp, nrow = 2)

Exercise Four:


Modify your density plot from Exercise Three. Use the facet_wrap function with the argument nrow = 2 to generate separate plots of Black-capped and Carolina chickadees.

Exercise Four:


birdMeasures %>%
  filter(spp %in% c('BCCH', 'CACH')) %>%
  ggplot(aes(x = mass)) +
  geom_density(aes(fill = sex),
               alpha = 0.7) +
  facet_wrap(~spp, nrow = 2)

Labels


Labels describe the plot and axis titles.

Labels


birdMeasures %>%
  filter(spp != 'NOCA') %>%
  ggplot(aes(x = spp)) +
  geom_bar(aes(fill = region),
           color = 'black',
           size = .7) +
  labs(title = 'Birds banded and recaptured 2000-2017',
       x = 'Species',
       y = 'Count')

Labels


Piping can be used …

birdMeasures %>%
  filter(spp != 'NOCA') %>%
  mutate(spp = factor(
    spp,
    labels = c(
      'American robin',
      'Black-capped chickadee',
      'Carolina chickadee',
      'Gray catbird'
    )
  )) %>%
  ggplot(aes(x = spp)) +
  geom_bar(aes(fill = region),
           color = 'black',
           size = .7) +
  labs(title = 'Birds banded and recaptured 2000-2017',
       x = 'Species',
       y = 'Count')
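One caution when relabeling with `factor`: when only `labels` is supplied, the labels are matched to the sorted unique values of the vector, not to the order in which the values appear. The sketch below uses a made-up vector (`sppCodes` is hypothetical) and shows a variant that states the levels explicitly, which may be safer when the sort order is not obvious:

```r
# Hypothetical species codes, deliberately out of alphabetical order:
sppCodes <- c('GRCA', 'AMRO', 'CACH', 'BCCH', 'AMRO')

commonNames <- c(
  'American robin',
  'Black-capped chickadee',
  'Carolina chickadee',
  'Gray catbird'
)

# Labels only: matched to the sorted codes (AMRO, BCCH, CACH, GRCA)
byPosition <- factor(sppCodes, labels = commonNames)

# Levels and labels: the pairing of code and name is stated explicitly
byName <- factor(
  sppCodes,
  levels = c('AMRO', 'BCCH', 'CACH', 'GRCA'),
  labels = commonNames
)

# Both approaches give the same result here because the labels were
# supplied in alphabetical order of the codes:
as.character(byPosition)
identical(byPosition, byName)
```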

Labels


It’s a good time to assign names!

birdCaptures_basicPlot <- birdMeasures %>%
  filter(spp != 'NOCA') %>%
  mutate(spp = factor(
    spp,
    labels = c(
      'American robin',
      'Black-capped chickadee',
      'Carolina chickadee',
      'Gray catbird'
    )
  )) %>%
  ggplot(aes(x = spp)) +
  geom_bar(aes(fill = region),
           color = 'black',
           size = .7) +
  labs(title = 'Birds banded and recaptured 2000-2017',
       x = 'Species',
       y = 'Count')

Labels


birdCaptures_basicPlot

Exercise Five:


Modify the density plot you created in Exercise Four:

  • Using the piping method, change the names “BCCH” and “CACH” to “Black-capped” and “Carolina”
  • Add the title “Mass of Carolina and Black-capped chickadees” and capitalize the x and y axis titles
  • Assign the name “massDensity” to the plot

Exercise Five:


# Labels for massDensity:

massDensity <- birdMeasures %>%
  filter(spp %in% c('BCCH', 'CACH')) %>%
  mutate(spp = factor(
    spp,
    labels = c(
      'Black-capped',
      'Carolina'
    )
  )) %>%
  ggplot(aes(x = mass)) +
  geom_density(aes(fill = sex),
               alpha = 0.7) +
  facet_wrap(~spp, nrow = 2) +
  labs(title = "Mass of Carolina and Black-capped chickadees",
       x = 'Mass', 
       y = 'Density')

massDensity

Scaling axes


Changing the scale of an axis changes the range of numbers and the names and locations of tick marks.

Scaling axes


birdCaptures_basicPlot +
  scale_y_continuous(expand = c(0,0))

Scaling axes


birdMeasures %>%
  filter(spp != 'NOCA') %>%
  group_by(spp) %>%
  summarize(n = n())
## # A tibble: 4 x 2
##     spp     n
##   <chr> <int>
## 1  AMRO   671
## 2  BCCH   508
## 3  CACH   797
## 4  GRCA  1395
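The counts matter because the upper axis limit should sit above the tallest bar. A minimal sketch on made-up data (`toyBirds`, `step`, and `yMax` are hypothetical names) rounds the largest count up to a convenient multiple; applied to the table above, rounding 1395 up to the next multiple of 250 gives 1500:

```r
library(dplyr)

# Hypothetical toy data standing in for birdMeasures (values are made up):
toyBirds <- data.frame(
  spp = c('AMRO', 'GRCA', 'GRCA', 'CACH', 'GRCA'),
  stringsAsFactors = FALSE
)

# Count the observations of each species:
sppCounts <- toyBirds %>%
  group_by(spp) %>%
  summarize(n = n())

# Round the largest count up to the next multiple of a chosen step:
step <- 5
yMax <- ceiling(max(sppCounts$n) / step) * step

yMax

# The same arithmetic for the largest count in the lesson's table:
ceiling(1395 / 250) * 250
```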

Scaling axes


birdCaptures_basicPlot +
  scale_y_continuous(expand = c(0, 0),
                     limits = c(0, 1500))

Scaling axes


birdCaptures_basicPlot +
  scale_y_continuous(expand = c(0, 0),
                     limits = c(0, 1500),
                     breaks = seq(0, 1500, by = 250))
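The three arguments can be sketched together on made-up data (`toyBirds` is hypothetical). One thing to watch for: by default, values that fall outside `limits` are dropped with a warning rather than drawn, so it is worth checking that the tallest bar fits before setting the upper limit:

```r
library(ggplot2)

# Hypothetical toy data: 3 robins and 7 catbirds
toyBirds <- data.frame(
  spp = rep(c('AMRO', 'GRCA'), times = c(3, 7)),
  stringsAsFactors = FALSE
)

# The positions at which tick marks will be placed:
yBreaks <- seq(0, 10, by = 2)

scaledPlot <- ggplot(toyBirds, aes(x = spp)) +
  geom_bar() +
  scale_y_continuous(
    expand = c(0, 0),   # no padding below 0 or above the upper limit
    limits = c(0, 10),  # the range of the axis
    breaks = yBreaks    # where the tick marks fall
  )

yBreaks

# The tallest bar must not exceed the upper limit:
max(layer_data(scaledPlot)$count)
```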

Exercise Six:


Plot massDensity. Use the expand, limits, and breaks arguments of the function scale_y_continuous to scale the y-axis such that the scale ranges from 0 to 0.8 and breaks occur at intervals of 0.1.

Exercise Six:


massDensity +
  scale_y_continuous(expand = c(0, 0),
                     limits = c(0, 0.8),
                     breaks = seq(0, 0.8, by = 0.1))

Colors


The default colors of ggplot are pretty ugly. Luckily, you can modify them in many ways!

birdMeasures %>%
  filter(spp %in% c('BCCH', 'CACH')) %>%
  ggplot(aes(x = mass)) +
  geom_histogram(aes(fill = sex), 
                 bins = 20,
                 color = 'black') +
  facet_wrap(~spp, nrow = 2) +
  scale_fill_manual(values = c('blue', 'red'))

Colors


Color-picker apps can be a great way to find colors that you like on the internet.

Colors


Using Team Zissou’s hat and shirt color:

birdMeasures %>%
  filter(spp %in% c('BCCH', 'CACH')) %>%
  ggplot(aes(x = mass)) +
  geom_histogram(aes(fill = sex), 
                 bins = 20,
                 color = 'black') +
  facet_wrap(~spp, nrow = 2) +
  scale_fill_manual(values = c('#9EB8C5', '#F32017'))

Colors


You can hunt around to find colors that you like and then save your palette for use later:

zPalette <- c('#9EB8C5', '#F32017')

birdMeasures %>%
  filter(spp %in% c('BCCH', 'CACH')) %>%
  ggplot(aes(x = mass)) +
  geom_histogram(aes(fill = sex), 
                 bins = 20,
                 color = 'black') +
  facet_wrap(~spp, nrow = 2) +
  scale_fill_manual(values = zPalette)
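An unnamed palette is matched to the groups by position. One option, sketched below on made-up data, is to name the palette so that each color is tied to a value. The names 'F' and 'M' are an assumption about how sex is coded, and `toyBirds` and `sexPalette` are hypothetical:

```r
library(ggplot2)

# Hypothetical toy data (values and sex codes are made up):
toyBirds <- data.frame(
  mass = c(9.7, 10.1, 9.9, 10.9, 11.4, 11.8),
  sex  = c('F', 'M', 'F', 'M', 'F', 'M'),
  stringsAsFactors = FALSE
)

# A named palette ties each color to a value of the variable:
sexPalette <- c('F' = '#9EB8C5', 'M' = '#F32017')

palettePlot <- ggplot(toyBirds, aes(x = mass)) +
  geom_histogram(aes(fill = sex),
                 bins = 5,
                 color = 'black') +
  scale_fill_manual(values = sexPalette)

# The fill colors that are drawn come from the palette:
unique(layer_data(palettePlot)$fill)
```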

Exercise Seven:


Modify the density plot you created in Exercise Six. Use scale_fill_manual to set custom fill colors.

Exercise Seven:


# Colors for massDensity:

massDensity +
  scale_y_continuous(expand = c(0, 0),
                     limits = c(0, 0.8),
                     breaks = seq(0, 0.8, by = 0.1)) +
  scale_fill_manual(values = c('#9EB8C5', '#F32017'))

Legends


Legends can be modified in a number of ways. One method to do so is to modify the data frame coming into the plotting functions:

birdMeasures %>%
  filter(spp %in% c('BCCH', 'CACH')) %>%
  mutate(sex = factor(sex,
                      labels = c('Female','Male'))) %>%
  ggplot(aes(x = mass)) +
  geom_histogram(aes(fill = sex), 
                 bins = 20,
                 color = 'black') +
  facet_wrap(~spp, nrow = 2) +
  scale_fill_manual(values = c('#9EB8C5', '#F32017'))

Legends


We can also use the scale_fill_manual function from above to modify the legend by specifying the name and labels arguments:

birdMeasures %>%
  filter(spp %in% c('BCCH', 'CACH')) %>%
  ggplot(aes(x = mass)) +
  geom_histogram(aes(fill = sex), 
                 bins = 20,
                 color = 'black') +
  facet_wrap(~spp, nrow = 2) +
  scale_fill_manual(values = c('#9EB8C5', '#F32017'), 
                    name = 'Sex', 
                    labels = c('Female', 'Male'))

Exercise Eight:


Modify the density plot you created in Exercise Seven. Use scale_fill_manual to set the legend title and labels.

Exercise Eight:


# Legend title and labels for massDensity:
massDensity +
  scale_y_continuous(expand = c(0, 0),
                     limits = c(0, 0.8),
                     breaks = seq(0, 0.8, by = 0.1)) +
  scale_fill_manual(values = zPalette,
                    name = 'Sex', 
                    labels = c('Female', 'Male')) 

Themes


A theme describes many of the visual elements of a plot.

Themes are controlled by elements, including:

  • element_blank: A blank element
  • element_rect: A rectangle element
  • element_text: A text element
  • element_line: A line element
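As a rough sketch of how the four element functions above are used, each is assigned to a named part of the plot inside `theme()`. The data frame `toyBirds` is hypothetical and its values are made up:

```r
library(ggplot2)

# Hypothetical toy data (values are made up):
toyBirds <- data.frame(
  mass = c(9.7, 10.1, 9.9, 10.9, 11.4, 11.8)
)

themedPlot <- ggplot(toyBirds, aes(x = mass)) +
  geom_histogram(bins = 5) +
  theme(
    # A rectangle element: the background of the plotting panel
    panel.background = element_rect(fill = 'white'),
    # A line element: the major grid lines
    panel.grid.major = element_line(color = 'gray80'),
    # A text element: the axis titles
    axis.title = element_text(size = 18),
    # A blank element: draw no minor grid lines at all
    panel.grid.minor = element_blank()
  )

themedPlot
```

The theme changes only how the plot looks; the bars themselves are unchanged.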

Themes


Before exploring themes, let’s take a moment to assign names to the current versions of our plots:

histogram2Theme <- birdMeasures %>%
  filter(spp %in% c('BCCH', 'CACH')) %>%
  mutate(spp = factor(
    spp,
    labels = c(
      'Black-capped',
      'Carolina'
    )
  )) %>%
  ggplot(aes(x = mass)) +
  geom_histogram(aes(fill = sex), 
                 bins = 20,
                 color = 'black') +
  scale_y_continuous(expand = c(0, 0),
                     limits = c(0, 150),
                     breaks = seq(0, 150, by = 25)) +
  facet_wrap(~spp, nrow = 2) +
  scale_fill_manual(values = c('#9EB8C5', '#F32017'), 
                    name = 'Sex', 
                    labels = c('Female', 'Male'))

Themes


Before exploring themes, let’s take a moment to assign names to the current versions of our plots:

density2Theme <- massDensity +
  scale_y_continuous(expand = c(0, 0),
                     limits = c(0, 0.8),
                     breaks = seq(0, 0.8, by = 0.1)) +
  scale_fill_manual(values = zPalette,
                    name = 'Sex', 
                    labels = c('Female', 'Male')) 

Themes


Remove gray panel background using element_rect:

histogram2Theme +
  labs(title = 'Mass of Carolina and Black-capped chickadees',
       x = 'Mass',
       y = 'Count',
       fill = 'Sex') +
  theme(
    panel.background = element_rect(fill = 'white')
  )

Themes


Change panel lines using element_line:

histogram2Theme +
  labs(title = 'Mass of Carolina and Black-capped chickadees',
       x = 'Mass',
       y = 'Count',
       fill = 'Sex') +
  theme(
    panel.background = element_rect(fill = 'white'),
    panel.grid.major = element_line(color = 'gray80', size = .2)
  )

Themes


Modify the strip background using element_rect:

histogram2Theme +
  labs(title = 'Mass of Carolina and Black-capped chickadees',
       x = 'Mass',
       y = 'Count',
       fill = 'Sex') +
  theme(
    panel.background = element_rect(fill = 'white'),
    panel.grid.major = element_line(color = 'gray80', size = .2),
    strip.background = element_rect(fill = 'white')
  )

Themes


Modify the axis lines using element_line:

histogram2Theme +
  labs(title = 'Mass of Carolina and Black-capped chickadees',
       x = 'Mass',
       y = 'Count',
       fill = 'Sex') +
  theme(
    panel.background = element_rect(fill = 'white'),
    panel.grid.major = element_line(color = 'gray80', size = .2),
    axis.line = element_line(color = 'black', size = .5),
    strip.background = element_rect(fill = 'white')
  )

Themes


Remove the legend title using element_blank:

histogram2Theme +
  labs(title = 'Mass of Carolina and Black-capped chickadees',
       x = 'Mass',
       y = 'Count',
       fill = 'Sex') +
  theme(
    panel.background = element_rect(fill = 'white'),
    panel.grid.major = element_line(color = 'gray80', size = .2),
    axis.line = element_line(color = 'black', size = .5),
    strip.background = element_rect(fill = 'white'),
    legend.title = element_blank()
  )

Themes


Change the size of the tick label text using axis.text and element_text:

histogram2Theme +
  labs(title = 'Mass of Carolina and Black-capped chickadees',
       x = 'Mass',
       y = 'Count',
       fill = 'Sex') +
  theme(
    panel.background = element_rect(fill = 'white'),
    panel.grid.major = element_line(color = 'gray80', size = .2),
    axis.line = element_line(color = 'black', size = .5),
    strip.background = element_rect(fill = 'white'),
    legend.title = element_blank(),
    axis.text = element_text(size = 12)
  )

Themes


To make the axis titles bigger, we use axis.title and element_text:

histogram2Theme +
  labs(title = 'Mass of Carolina and Black-capped chickadees',
       x = 'Mass',
       y = 'Count',
       fill = 'Sex') +
  theme(
    panel.background = element_rect(fill = 'white'),
    panel.grid.major = element_line(color = 'gray80', size = .2),
    axis.line = element_line(color = 'black', size = .5),
    strip.background = element_rect(fill = 'white'),
    legend.title = element_blank(),
    axis.text = element_text(size = 12),
    axis.title = element_text(size = 18)
  )

Themes


To make the facet labels bigger, we use strip.text and element_text:

histogram2Theme +
  labs(title = 'Mass of Carolina and Black-capped chickadees',
       x = 'Mass',
       y = 'Count',
       fill = 'Sex') +
  theme(
    panel.background = element_rect(fill = 'white'),
    panel.grid.major = element_line(color = 'gray80', size = .2),
    axis.line = element_line(color = 'black', size = .5),
    strip.background = element_rect(fill = 'white'),
    legend.title = element_blank(),
    axis.text = element_text(size = 12),
    axis.title = element_text(size = 18),
    strip.text = element_text(size = 18)
  )

Themes


Make the plot title larger using plot.title and element_text:

histogram2Theme +
  labs(title = 'Mass of Carolina and Black-capped chickadees',
       x = 'Mass',
       y = 'Count',
       fill = 'Sex') +
  theme(
    panel.background = element_rect(fill = 'white'),
    panel.grid.major = element_line(color = 'gray80', size = .2),
    axis.line = element_line(color = 'black', size = .5),
    strip.background = element_rect(fill = 'white'),
    legend.title = element_blank(),
    axis.text = element_text(size = 12),
    axis.title = element_text(size = 18),
    strip.text = element_text(size = 18),
    plot.title = element_text(size = 22)
  )

Themes


Add a margin between the plot and title (see ?margin):

histogram2Theme +
  labs(title = 'Mass of Carolina and\nBlack-capped chickadees',
       x = 'Mass',
       y = 'Count',
       fill = 'Sex') +
  theme(
    panel.background = element_rect(fill = 'white'),
    panel.grid.major = element_line(color = 'gray80', size = .2),
    axis.line = element_line(color = 'black', size = .5),
    strip.background = element_rect(fill = 'white'),
    legend.title = element_blank(),
    axis.text = element_text(size = 12),
    axis.title = element_text(size = 18),
    strip.text = element_text(size = 18),
    plot.title = element_text(size = 22, margin = margin(b = 40))
  )
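Because the same `theme()` call is repeated on every slide above, one option is to assign the theme a name and add it to any plot. This is a sketch on made-up data, not part of the lesson's source script; `toyBirds` and `cleanTheme` are hypothetical names:

```r
library(ggplot2)

# Hypothetical toy data (values and sex codes are made up):
toyBirds <- data.frame(
  mass = c(9.7, 10.1, 9.9, 10.9, 11.4, 11.8),
  sex  = c('F', 'M', 'F', 'M', 'F', 'M'),
  stringsAsFactors = FALSE
)

# Store the theme settings once:
cleanTheme <- theme(
  panel.background = element_rect(fill = 'white'),
  panel.grid.major = element_line(color = 'gray80'),
  axis.line = element_line(color = 'black'),
  strip.background = element_rect(fill = 'white'),
  legend.title = element_blank(),
  axis.text = element_text(size = 12),
  axis.title = element_text(size = 18),
  plot.title = element_text(size = 22, margin = margin(b = 40))
)

# Add the stored theme to a histogram:
toyHistogram <- ggplot(toyBirds, aes(x = mass)) +
  geom_histogram(aes(fill = sex),
                 bins = 5,
                 color = 'black') +
  cleanTheme

# ... and to a density plot:
toyDensity <- ggplot(toyBirds, aes(x = mass)) +
  geom_density(aes(fill = sex),
               alpha = 0.7) +
  cleanTheme
```

A stored theme keeps several plots visually consistent, and a later change to the theme only has to be made in one place.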

Exercise Nine:


Make your density plot as pretty as possible using themes!